FinBERT: Financial Sentiment Analysis with Pre-trained Language Models

submitted in partial fulfillment for the degree of master of science

Dogu Araci
12255068
master information studies
data science
faculty of science
university of amsterdam
2019-06-25

Title, Name                            Affiliation      Email
Internal Supervisor: Dr Pengjie Ren    UvA, ILPS        p.ren@uva.nl
External Supervisor: Dr Zulkuf Genc    Naspers Group    zulkuf.genc@naspers.com

arXiv:1908.10063v1 [cs.CL] 27 Aug 2019

FinBERT: Financial Sentiment Analysis with Pre-trained Language Models

Dogu Tan Araci
dogu.araci@student.uva.nl
University of Amsterdam
Amsterdam, The Netherlands

ABSTRACT
Financial sentiment analysis is a challenging task due to the specialized language and the lack of labeled data in that domain. General-purpose models are not effective enough because of the specialized language used in financial contexts. We hypothesize that pre-trained language models can help with this problem because they require fewer labeled examples and can be further trained on domain-specific corpora. We introduce FinBERT, a language model based on BERT, to tackle NLP tasks in the financial domain. Our results show improvement in every measured metric on current state-of-the-art results for two financial sentiment analysis datasets. We find that even with a smaller training set and fine-tuning only a part of the model, FinBERT outperforms state-of-the-art machine learning methods.

1 INTRODUCTION
Prices in an open market reflect all of the available information regarding assets exchanged in an economy [16]. When new information becomes available, all actors in the economy update their positions and prices adjust accordingly, which makes beating the markets consistently impossible. However, the definition of "new information" might change as new information retrieval technologies become available, and early adoption of such technologies might provide an advantage in the short term.

Analysis of financial texts, be it news, analyst reports or official company announcements, is a possible source of new information. With an unprecedented amount of such text being created every day, manually analyzing it and deriving actionable insights from it is too big a task for any single entity. Hence, automated sentiment or polarity analysis of texts produced by financial actors using natural language processing (NLP) methods has gained popularity during the last decade [4].

The principal research interest of this thesis is polarity analysis, which is classifying text as positive, negative or neutral in a specific domain. It requires addressing two challenges: 1) The most sophisticated classification methods that make use of neural nets require vast amounts of labeled data, and labeling financial text snippets requires costly expertise. 2) Sentiment analysis models trained on general corpora are not suited to the task, because financial texts have a specialized language with a unique vocabulary and a tendency to use vague expressions instead of easily identified negative/positive words.

Using carefully crafted financial sentiment lexicons such as Loughran and McDonald (2011) [11] may seem a solution because they incorporate existing financial knowledge into textual analysis. However, they are based on "word counting" methods, which fall short in analyzing the deeper semantic meaning of a given text.

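The limitation of "word counting" can be illustrated with a minimal sketch of a lexicon-based scorer. The tiny word lists below are placeholders invented for this example, not entries from the Loughran and McDonald dictionary, and `lexicon_polarity` is a hypothetical helper name.

```python
import re

# Toy lexicon for illustration only; NOT the Loughran-McDonald word lists.
POSITIVE = {"gain", "profit", "improve", "strong"}
NEGATIVE = {"loss", "decline", "weak", "lawsuit"}

def lexicon_polarity(text):
    """Label text by counting lexicon hits, ignoring word order and context."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# A shrinking loss is good news, but the counter only sees the word "loss".
print(lexicon_polarity("The company narrowed its loss this quarter"))
```

Because the scorer looks at isolated words, the sentence above is labeled negative even though a human reader would likely consider it positive.
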
NLP transfer learning methods look like a promising solution to both of the challenges mentioned above, and are the focus of this thesis. The core idea behind these models is that by training language models on very large corpora and then initializing down-stream models with the weights learned from the language modeling task, a much better performance can be achieved. The initialized layers can range from the single word embedding layer [23] to the whole model [5]. This approach should, in theory, be an answer to the scarcity of labeled data. Language models don't require any labels, since the task is predicting the next word. They can learn how to represent semantic information. That leaves to the fine-tuning on labeled data only the task of learning how to use this semantic information to predict the labels.

One particular component of transfer learning methods is the ability to further pre-train the language models on a domain-specific unlabeled corpus. Thus, the model can learn the semantic relations in the text of the target domain, which is likely to have a different distribution than a general corpus. This approach is especially promising for a niche domain like finance, since the language and vocabulary used are dramatically different from general ones.

The goal of this thesis is to test these hypothesized advantages of using and fine-tuning pre-trained language models for the financial domain. To that end, we try to predict the sentiment of a sentence from a financial news article towards the financial actor depicted in the sentence, using the Financial PhraseBank created by Malo et al. (2014) [17] and the FiQA Task 1 sentiment scoring dataset [15].

The main contributions of this thesis are the following:
• We introduce FinBERT, which is a language model based on BERT for financial NLP tasks. We evaluate FinBERT on two financial sentiment analysis datasets.
• We achieve the state-of-the-art on FiQA sentiment scoring and Financial PhraseBank.
• We implement two other pre-trained language models, ULMFit and ELMo, for financial sentiment analysis and compare these with FinBERT.
• We conduct experiments to investigate several aspects of the model, including: effects of further pre-training on a financial corpus, training strategies to prevent catastrophic forgetting, and fine-tuning only a small subset of model layers to decrease training time without a significant drop in performance.

The rest of the thesis is structured as follows: First, relevant literature in both financial polarity analysis and pre-trained language models is discussed (Section 2). Then, the evaluated models are described (Section 3). This is followed by the description of the experimental setup being used (Section 4). In Section 5, we present the experimental results on the financial sentiment datasets. Then we further analyze FinBERT from different perspectives in Section 6. Finally, we conclude with Section 7.

2 RELATED LITERATURE
This section describes previous research conducted on sentiment analysis in finance (2.1) and text classification using pre-trained language models (2.2).

2.1 Sentiment analysis in finance
Sentiment analysis is the task of extracting sentiments or opinions of people from written language [10]. We can divide the recent efforts into two groups: 1) Machine learning methods with features extracted from text with "word counting" [1, 19, 28, 30], 2) Deep learning methods, where text is represented by a sequence of embeddings [2, 25, 32]. The former suffers from an inability to represent the semantic information that results from a particular sequence of words, while the latter is often deemed too "data-hungry" as it learns a much higher number of parameters [18].

Financial sentiment analysis differs from general sentiment analysis not only in domain, but also in purpose. The purpose behind financial sentiment analysis is usually guessing how the markets will react to the information presented in the text [9]. Loughran and McDonald (2016) present a thorough survey of recent works on financial text analysis utilizing machine learning with a "bag-of-words" approach or lexicon-based methods [12]. For example, Loughran and McDonald (2011) create a dictionary of financial terms with assigned values such as "positive" or "uncertain" and measure the tone of a document by counting words with a specific dictionary value [11]. Another example is Pagolu et al. (2016), where n-grams from tweets with financial information are fed into supervised machine learning algorithms to detect the sentiment regarding the financial entity mentioned.

One of the first papers that used deep learning methods for textual financial polarity analysis was Kraus and Feuerriegel (2017) [7]. They apply an LSTM neural network to ad-hoc company announcements to predict stock-market movements and show that method to be more accurate than traditional machine learning approaches. They find that pre-training their model on a larger corpus improves the result; however, their pre-training is done on a labeled dataset, which is a more limiting approach than ours, as we pre-train a language model as an unsupervised task.

There are several other works that employ various types of neural architectures for financial sentiment analysis. Sohangir et al. (2018) [26] apply several generic neural network architectures to a StockTwits dataset, finding CNN to be the best performing neural network architecture. Lutz et al. (2018) [13] take the approach of using doc2vec to generate sentence embeddings in a particular company ad-hoc announcement and utilize multi-instance learning to predict stock market outcomes. Maia et al. (2018) [14] use a combination of text simplification and an LSTM network to classify a set of sentences from financial news according to their sentiment and achieve state-of-the-art results for the Financial PhraseBank, which is used in this thesis as well.

Due to the lack of large labeled financial datasets, it is difficult to utilize neural networks to their full potential for sentiment analysis. Even when their first (word embedding) layers are initialized with pre-trained values, the rest of the model still needs to learn complex relations with a relatively small amount of labeled data. A more promising solution could be initializing almost the entire model with pre-trained values and fine-tuning those values with respect to the classification task.

2.2 Text classification using pre-trained language models
Language modeling is the task of predicting the next word in a given piece of text. One of the most important recent developments in natural language processing is the realization that a model trained for language modeling can be successfully fine-tuned for most down-stream NLP tasks with small modifications. These models are usually trained on very large corpora and then, with the addition of suitable task-specific layers, fine-tuned on the target dataset [6]. Text classification, which is the focus of this thesis, is one of the obvious use-cases for this approach.

ELMo (Embeddings from Language Models) [23] was one of the first successful applications of this approach. With ELMo, a deep bidirectional language model is pre-trained on a large corpus. For each word, the hidden states of this model are used to compute a contextualized representation. Using the pre-trained weights of ELMo, contextualized word embeddings can be calculated for any piece of text. Initializing embeddings for down-stream tasks with those was shown to improve performance on most tasks compared to static word embeddings such as word2vec or GloVe. For text classification tasks like SST-5, it achieved state-of-the-art performance when used together with a bi-attentive classification network [20].

Although ELMo makes use of pre-trained language models for contextualizing representations, the information extracted using a language model is still present only in the first layer of any model using it. ULMFit (Universal Language Model Fine-tuning) [5] was the first paper to achieve true transfer learning for NLP: using novel techniques such as discriminative fine-tuning, slanted triangular learning rates and gradual unfreezing, its authors were able to efficiently fine-tune a whole pre-trained language model for text classification. They also introduced further pre-training of the language model on a domain-specific corpus, assuming target task data comes from a different distribution than the general corpus the initial model was trained on.

ULMFit's main idea of efficiently fine-tuning a pre-trained language model for down-stream tasks was brought to another level with Bidirectional Encoder Representations from Transformers (BERT) [3], which is also the main focus of this paper. BERT has two important differences from what came before: 1) It defines the task of language modeling as predicting randomly masked tokens in a sequence rather than the next token, in addition to a task of classifying two sentences as following each other or not. 2) It is a very big network trained on an unprecedentedly large corpus. These two factors enabled it to achieve state-of-the-art results in multiple NLP tasks such as natural language inference and question answering.

The specifics of fine-tuning BERT for text classification have not been researched thoroughly. One such recent work is Sun et al. (2019) [27]. They conduct a series of experiments regarding different configurations of BERT for text classification. Some of their results will be referenced throughout the rest of the thesis, for the configuration of our model.

3 METHOD
In this section, we present FinBERT, our BERT implementation for the financial domain, after giving a brief background on relevant neural architectures.

3.1 Preliminaries
3.1.1 LSTM. Long short-term memory (LSTM) is a type of recurrent neural network that allows long-term dependencies in a sequence to persist in the network by using "forget" and "update" gates. It is one of the primary architectures for modeling any sequential data generation process, from stock prices to natural language. Since a text is a sequence of tokens, the first choice for any LSTM natural language processing model is determining how to initially represent a single token. Using pre-trained weights for the initial token representation is the common practice. One such pre-training algorithm is GloVe (Global Vectors for Word Representation) [22]. GloVe is a model for calculating word representations with the unsupervised task of training a log-bilinear regression model on a word-word co-occurrence matrix from a large corpus. It is an effective model for representing words in a vector space; however, it doesn't contextualize these representations with respect to the sequence they are actually used in.¹

3.1.2 ELMo. ELMo embeddings [23] are contextualized word representations in the sense that the surrounding words influence the representation of the word. In the center of ELMo, there is a bidirectional language model with multiple LSTM layers. The goal of a language model is to learn the probability distribution over sequences of tokens in a given vocabulary. ELMo models the probability of a token given the previous (and separately following) tokens in the sequence. Then the model also learns how to weight different representations from different LSTM layers in order to calculate one contextualized vector per token. Once the contextualized representations are extracted, these can be used to initialize any down-stream NLP task.²

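The layer-weighting step can be sketched as a softmax-normalized weighted sum of the per-layer vectors of one token, scaled by a scalar, which is the form given in the ELMo paper [23]. The function name `scalar_mix`, the toy vectors and the scores are assumptions made for this illustration; in practice the scores and the scale are learned together with the down-stream task.

```python
import math

def scalar_mix(layer_vectors, layer_scores, gamma=1.0):
    """Combine per-layer vectors for one token into a single vector.

    layer_vectors: list of equal-length vectors, one per language model layer.
    layer_scores:  unnormalized (learned) scores, one per layer.
    gamma:         overall (learned) scale factor.
    """
    # Softmax turns the scores into non-negative weights that sum to 1.
    top = max(layer_scores)
    exps = [math.exp(s - top) for s in layer_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(layer_vectors[0])
    return [gamma * sum(w * vec[i] for w, vec in zip(weights, layer_vectors))
            for i in range(dim)]

# Two toy layers with equal scores contribute equally to the mixed vector.
mixed = scalar_mix([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```
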
3.1.3 ULMFit. ULMFit is a transfer learning model for down-stream NLP tasks that makes use of language model pre-training [5]. Unlike ELMo, with ULMFit the whole language model is fine-tuned together with the task-specific layers. The underlying language model used in ULMFit is AWD-LSTM, which uses sophisticated dropout tuning strategies to better regularize its LSTM model [21]. For classification using ULMFit, two linear layers are added to the pre-trained AWD-LSTM, the first of which takes the pooled last hidden states as input.

ULMFit comes with novel training strategies for further pre-training the language model on a domain-specific corpus and fine-tuning on the down-stream task. We implement these strategies with FinBERT as explained in section 3.2.

¹ The pre-trained weights for GloVe can be found here: https://nlp.stanford.edu/projects/glove/
² The pre-trained ELMo models can be found here: https://allennlp.org/elmo

3.1.4 Transformer. The Transformer is an attention-based architecture for modeling sequential information that is an alternative to recurrent neural networks [29]. It was proposed as a sequence-to-sequence model, therefore including encoder and decoder mechanisms. Here, we will focus only on the encoder part (though the decoder is quite similar). The encoder consists of multiple identical Transformer layers. Each layer has a multi-headed self-attention layer and a fully connected feed-forward network. For one self-attention layer, three mappings from embeddings (key, query and value) are learned. Using each token's query and all tokens' key vectors, a similarity score is calculated with the dot product. These scores are used to weight the value vectors to arrive at the new representation of the token. With multi-headed self-attention, the outputs of several such attention layers are concatenated together, so that the sequence can be evaluated from varying "perspectives". Then the resulting vectors go through fully connected networks with shared parameters.

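A single attention head can be sketched in plain Python as follows. This is a toy illustration rather than the thesis implementation: the learned query, key and value mappings are assumed to have been applied already, and the division by the square root of the key dimension follows the original Transformer paper [29] rather than anything stated in the text above.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Single-head scaled dot-product attention over one sequence.

    Each argument is a list with one vector per token.
    """
    d_k = len(keys[0])
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this token's query with every token's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # New representation: attention-weighted average of the value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(dim)])
    return outputs
```

In a multi-headed layer this computation would run once per head with different learned mappings, and the per-head outputs would be concatenated.
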
As argued by Vaswani et al. (2017) [29], the Transformer architecture has several advantages over RNN-based approaches. Because of their sequential nature, RNNs are much harder to parallelize on GPUs, and too many steps between far-away elements in a sequence make it hard for information to persist.

3.1.5 BERT. BERT [3] is in essence a language model that consists of a set of Transformer encoders stacked on top of each other. However, it defines the language modeling task differently from ELMo and AWD-LSTM. Instead of predicting the next word given the previous ones, BERT "masks" a randomly selected 15% of all tokens. With a softmax layer over the vocabulary on top of the last encoder layer, the masked tokens are predicted. A second task BERT is trained on is "next sentence prediction". Given two sentences, the model predicts whether or not these two actually follow each other.

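The masking step can be sketched as below. This is a simplified illustration: every selected token is replaced by `[MASK]`, whereas the original BERT procedure [3] sometimes keeps the token or swaps in a random one. The function name, the fixed seed and the rounding rule are assumptions made for the example.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace a random subset of tokens with [MASK].

    Returns the masked sequence and a dict {position: original token}
    that serves as the prediction target.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = ["[MASK]" if i in positions else tok
              for i, tok in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return masked, targets

sentence = "operating profit rose compared with the same period last year".split()
masked, targets = mask_tokens(sentence)
```

The model only receives `masked`; the loss is computed on the positions listed in `targets`.
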
The input sequence is represented with token and position embeddings. Two tokens denoted by [CLS] and [SEP] are added to the beginning and end of the sequence respectively. For all classification tasks, including next sentence prediction, the [CLS] token is used.

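A minimal sketch of this input layout is given below. The helper name is hypothetical, tokens are assumed to be already split, and the extra `[SEP]` between two sentences follows the original BERT paper [3]; the embedding lookup itself is omitted.

```python
def build_bert_input(tokens_a, tokens_b=None):
    """Wrap one or two tokenized sentences with [CLS] and [SEP] markers.

    Returns the token list and the position index of each token.
    """
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    if tokens_b is not None:
        # Sentence pairs (e.g. next sentence prediction) share one sequence.
        tokens += list(tokens_b) + ["[SEP]"]
    positions = list(range(len(tokens)))
    return tokens, positions

tokens, positions = build_bert_input(["profit", "rose"])
```

The hidden state at position 0, which belongs to `[CLS]`, is the one passed to a classification layer.
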
BERT has two versions: BERT-base, with 12 encoder layers, a hidden size of 768, 12 multi-head attention heads and 110M parameters in total, and BERT-large, with 24 encoder layers, a hidden size of 1024, 16 multi-head attention heads and 340M parameters. Both of these models have been trained on BookCorpus [33] and English Wikipedia, which have in total more than 3,500M words.³

3.2 BERT for financial domain: FinBERT
In this subsection we will describe our implementation of BERT: 1) how further pre-training on a domain corpus is done, 2-3) how we implemented BERT for classification and regression tasks, 4) training strategies we used during fine-tuning to prevent catastrophic forgetting.

3.2.1 Further pre-training. Howard and Ruder (2018) [5] show that further pre-training a language model on a target domain corpus improves the eventual classification performance. For BERT, there is no decisive research showing that this would be the case as well. Regardless, we implement further pre-training in order to observe whether such adaptation is beneficial for the financial domain.

³ The pre-trained weights are made public by the creators of BERT. The code and weights can be found here: https://github.com/google-research/bert

For further pre-training, we experiment with two approaches. The first is pre-training the model on a relatively large corpus from the target domain. For that, we further pre-train a BERT language model on a financial corpus (details of the corpus can be found in section 4.2.1). The second approach is pre-training the model only on the sentences from the training classification dataset. Although the second corpus is much smaller, using data from the direct target might provide better target domain adaptation.

3.2.2 FinBERT for text classification. Sentiment classification is conducted by adding a dense layer after the last hidden state of the [CLS] token. This is the recommended practice for using BERT for any classification task [3]. Then, the classifier network is trained on the labeled sentiment dataset. An overview of all the steps involved in the procedure is presented in figure 1.

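The classification head can be sketched as a single dense layer followed by a softmax over the three sentiment labels. The weights below are toy values and the 2-dimensional vector stands in for the 768-dimensional [CLS] hidden state of BERT-base; the function name, the label order and the use of softmax for the output are assumptions of this sketch.

```python
import math

LABELS = ("positive", "negative", "neutral")

def classify_cls(cls_vector, weights, biases, labels=LABELS):
    """Dense layer plus softmax applied to the [CLS] hidden state.

    weights: one row of coefficients per label.
    biases:  one bias per label.
    """
    logits = [sum(w * x for w, x in zip(row, cls_vector)) + b
              for row, b in zip(weights, biases)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

label, probs = classify_cls([1.0, 0.0],
                            [[2.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
                            [0.0, 0.0, 0.0])
```
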
3.2.3 FinBERT for regression. While the focus of this paper is classification, we also implement regression with almost the same architecture on a different dataset with continuous targets. The only difference is that the loss function being used is mean squared error instead of the cross entropy loss.

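The two loss functions can be written out as follows. This is a plain-Python sketch for single examples or small batches with hypothetical function names; an actual implementation would rely on the loss functions of a deep learning framework.

```python
import math

def cross_entropy(probs, true_index):
    """Classification loss: negative log-probability of the correct label."""
    return -math.log(probs[true_index])

def mean_squared_error(predictions, targets):
    """Regression loss: average squared gap between scores and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Classification: predicted label distribution versus the true label index.
ce = cross_entropy([0.5, 0.25, 0.25], true_index=0)
# Regression: predicted continuous sentiment scores versus gold scores.
mse = mean_squared_error([0.5, -0.5], [0.0, 0.0])
```

For regression the dense layer would output one number instead of one value per label.
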
3.2.4 Training strategies to prevent catastrophic forgetting. As pointed out by Howard and Ruder (2018) [5], catastrophic forgetting is a significant danger with this fine-tuning approach, because the fine-tuning procedure can quickly cause the model to "forget" the information from the language modeling task as it tries to adapt to the new task. In order to deal with this phenomenon, we apply three techniques proposed by Howard and Ruder (2018): slanted triangular learning rates, discriminative fine-tuning and gradual unfreezing.

Slanted triangular learning rate applies a learning rate schedule in the shape of a slanted triangle, that is, the learning rate first linearly increases up to some point and after that point linearly decreases.

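The schedule can be sketched with the formula given by Howard and Ruder [5]. The hyperparameter names `cut_frac` (fraction of steps spent increasing) and `ratio` (how much smaller the lowest rate is than the peak) come from that paper; the default values and the step counts used here are illustrative and are not claimed to be the settings of this thesis.

```python
import math

def slanted_triangular_lr(step, total_steps, lr_max, cut_frac=0.1, ratio=32):
    """Learning rate at a given step of a slanted triangular schedule.

    Assumes total_steps * cut_frac >= 1 so the warm-up phase is non-empty.
    """
    cut = math.floor(total_steps * cut_frac)  # step at which the peak is reached
    if step < cut:
        p = step / cut                        # linear increase
    else:
        p = 1 - (step - cut) / (cut * (1 / cut_frac - 1))  # linear decrease
    return lr_max * (1 + p * (ratio - 1)) / ratio

# Short warm-up to the peak, then a long linear decay back down.
schedule = [slanted_triangular_lr(t, 100, 2e-5) for t in range(101)]
```
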
Discriminative fine-tuning is using lower learning rates for lower layers of the network. Assume our learning rate at layer $l$ is $\alpha_l$. Then, for a discrimination rate of $\theta$, we calculate the learning rate for layer $l-1$ as $\alpha_{l-1} = \theta \alpha_l$. The assumption behind this method is that the lower layers represent the deep-level language information, while the upper ones include information for the actual classification task. Therefore we fine-tune them differently.

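Applying the rule $\alpha_{l-1} = \theta \alpha_l$ repeatedly from the top layer downwards gives one learning rate per layer. The sketch below assumes a hypothetical helper name and uses toy values for the top-layer rate and for $\theta$.

```python
def layer_learning_rates(top_lr, num_layers, theta):
    """Learning rates ordered from the lowest layer to the top layer.

    The top layer uses top_lr; each layer below is scaled by theta once more,
    i.e. alpha_{l-1} = theta * alpha_l.
    """
    return [top_lr * theta ** (num_layers - 1 - layer)
            for layer in range(num_layers)]

# Toy example: three layers, each lower layer trains at half the rate.
rates = layer_learning_rates(top_lr=1.0, num_layers=3, theta=0.5)
```

With a value of $\theta$ below 1, the lowest layers receive the smallest updates, which matches the intent of changing low-level language information the least.
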
With gradual unfreezing, we start training with all layers but the classifier layer frozen. During training we gradually unfreeze all of the layers, starting from the highest one, so that the lower-level features become the least fine-tuned ones. Hence, during the initial stages of training, the model is prevented from "forgetting" the low-level language information that it learned from pre-training.
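The unfreezing order can be made concrete with a small sketch. The helper `unfrozen_layers`, the layer names and the value of `steps_per_unfreeze` are all hypothetical; the value 16 is only an illustration of "one third of an epoch" under the assumption of about 49 mini-batches per epoch (3101 training examples with a batch size of 64, as reported in Sections 4.2.2 and 4.5).

```python
def unfrozen_layers(step, steps_per_unfreeze, layer_names):
    """Return the names of the layers that are trainable at `step`.

    `layer_names` is ordered from the lowest layer to the highest, with the
    classifier last. Only the classifier is trainable at step 0; one more
    layer (the next one down) is unfrozen every `steps_per_unfreeze` steps.
    """
    num_unfrozen = min(len(layer_names), 1 + step // steps_per_unfreeze)
    return layer_names[len(layer_names) - num_unfrozen:]


# Hypothetical layer names for a 12-layer encoder with a classifier on top.
layers = ["embeddings"] + [f"encoder_{i}" for i in range(1, 13)] + ["classifier"]

first = unfrozen_layers(0, 16, layers)    # only the classifier
second = unfrozen_layers(16, 16, layers)  # classifier and the top encoder layer
```

Because layers are released from the top down, the lowest layers spend the most steps frozen and therefore receive the fewest updates.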
4 EXPERIMENTAL SETUP
4.1 Research Questions
We aim to answer the following research questions:
(RQ1) What is the performance of FinBERT in short sentence classification compared with the other transfer learning methods like ELMo and ULMFit?

Table 1: Distribution of sentiment labels and agreement levels in Financial PhraseBank

Agreement level   Positive   Negative   Neutral   Count
100%              25.2%      13.4%      61.4%     2262
75%-99%           26.6%      9.8%       63.6%     1191
66%-74%           36.7%      12.3%      50.9%     765
50%-65%           31.1%      14.4%      54.5%     627
All               28.1%      12.4%      59.4%     4845

(RQ2) How does FinBERT compare to the state-of-the-art in financial sentiment analysis with targets discrete or continuous?
(RQ3) How does further pre-training BERT on the financial domain, or the target corpus, affect the classification performance?
(RQ4) What are the effects of training strategies like slanted triangular learning rates, discriminative fine-tuning and gradual unfreezing on classification performance? Do they prevent catastrophic forgetting?
(RQ5) Which encoder layer performs best (or worst) for sentence classification?
(RQ6) How much fine-tuning is enough? That is, after pre-training, how many layers should be fine-tuned to achieve comparable performance to fine-tuning the whole model?
4.2 Datasets
4.2.1 TRC2-financial. In order to further pre-train BERT, we use a financial corpus we call TRC2-financial. It is a subset of Reuters' TRC2⁴, which consists of 1.8M news articles that were published by Reuters between 2008 and 2010. We filter for some financial keywords in order to make the corpus more relevant and to stay within the limits of the compute power available. The resulting corpus, TRC2-financial, includes 46,143 documents with more than 29M words and nearly 400K sentences.
4.2.2 Financial PhraseBank. The main sentiment analysis dataset used in this paper is Financial PhraseBank⁵ from Malo et al. (2014) [17]. Financial PhraseBank consists of 4845 English sentences selected randomly from financial news found on the LexisNexis database. These sentences were then annotated by 16 people with backgrounds in finance and business. The annotators were asked to give labels according to how they think the information in the sentence might affect the mentioned company's stock price. The dataset also includes information regarding the agreement levels on sentences among annotators. The distribution of agreement levels and sentiment labels can be seen in Table 1. We set aside 20% of all sentences as the test set and 20% of the remaining sentences as the validation set. In the end, our training set includes 3101 examples.
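The split sizes follow directly from the two 20% hold-outs. The short calculation below reproduces the training set size of 3101; rounding each hold-out to the nearest integer is an assumption of this sketch, since the rounding rule is not stated.

```python
total = 4845                        # sentences in Financial PhraseBank
test_size = round(total * 0.20)     # 20% of all sentences
remaining = total - test_size
val_size = round(remaining * 0.20)  # 20% of what is left after the test split
train_size = remaining - val_size

print(train_size, val_size, test_size)  # 3101 775 969
```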
For some of the experiments, we also make use of 10-fold cross-validation.
⁴ The corpus can be obtained for research purposes by applying here: https://trec.nist.gov/data/reuters/reuters.html
⁵ The dataset can be found here: https://www.researchgate.net/publication/251231364_FinancialPhraseBank-v10

[Figure 1 panels: language model on general corpus (BookCorpus + Wikipedia); language model on financial corpus (Reuters TRC2-financial); classification model on financial sentiment dataset (Financial PhraseBank). Each panel feeds [CLS]/token embeddings through encoders 1 to 12, followed by dense layers for masked LM and next sentence prediction, or for sentiment prediction.]
Figure 1: Overview of pre-training, further pre-training and classification fine-tuning

4.2.3 FiQA Sentiment. FiQA [15] is a dataset that was created for the WWW '18 conference financial opinion mining and question answering challenge⁶. We use the data for Task 1, which includes 1,174 financial news headlines and tweets with their corresponding sentiment scores. Unlike Financial PhraseBank, the targets for this dataset are continuous, ranging over [-1, 1] with 1 being the most positive. Each example also has information regarding which financial entity is targeted in the sentence. We do 10-fold cross-validation for evaluation of the model on this dataset.
4.3 Baseline Methods
For contrastive experiments, we consider baselines with three different methods: an LSTM classifier with GloVe embeddings, an LSTM classifier with ELMo embeddings and a ULMFit classifier. It should be noted that these baseline methods were not experimented with as thoroughly as BERT. Therefore the results should not be interpreted as definitive conclusions of one method being better.
4.3.1 LSTM classifiers. We implement two classifiers using bidirectional LSTM models. In both of them, a hidden size of 128 is used, with the last hidden state size being 256 due to bidirectionality. A fully connected feed-forward layer maps the last hidden state to a vector of three, representing the likelihood of the three labels. The difference between the two models is that one uses GloVe embeddings, while the other uses ELMo embeddings. A dropout probability of 0.3 and a learning rate of 3e-5 are used in both models. We train them until there is no improvement in validation loss for 10 epochs.
4.3.2 ULMFit. As explained in Section 3.1.3, classification with ULMFit consists of three steps. The first step of pre-training a language model is already done and the pre-trained weights are released by Howard and Ruder (2018). We first further pre-train the AWD-LSTM language model on the TRC2-financial corpus for 3 epochs. After that, we fine-tune the model for classification on the Financial PhraseBank dataset, by adding a fully-connected layer to the output of the pre-trained language model.
⁶ Data can be found here: https://sites.google.com/view/fiqa/home
4.4 Evaluation Metrics
For evaluation of classification models, we use three metrics: accuracy, cross-entropy loss and macro F1 average. We weight the cross-entropy loss with the square root of the inverse frequency rate. For example, if a label constitutes 25% of all examples, we weight the loss attributed to that label by 2.
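The weighting rule can be written as a one-line function. In the sketch below the name `class_weights` is hypothetical, and feeding it the label shares of the whole dataset from Table 1 is an assumption made for illustration; the text does not say on which split the frequencies are computed.

```python
import math


def class_weights(label_fractions):
    """Weight each label by the square root of its inverse frequency."""
    return {label: math.sqrt(1.0 / frac) for label, frac in label_fractions.items()}


# The example from the text: a label covering 25% of the data gets weight 2.
quarter = class_weights({"some_label": 0.25})["some_label"]

# Label shares of the whole dataset, taken from the "All" row of Table 1.
weights = class_weights({"positive": 0.281, "negative": 0.124, "neutral": 0.594})
```

The rarer a label, the larger its weight, so mistakes on negative sentences cost more than mistakes on the dominant neutral class.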
Macro F1 average calculates F1 scores for each of the classes and then takes the average of them. Since our data, Financial PhraseBank, suffers from label imbalance (almost 60% of all sentences are neutral), this gives another good measure of the classification performance.
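To see why macro F1 is informative under label imbalance, consider a classifier that always predicts the majority class. The sketch below is a plain-Python illustration; the function `macro_f1` and the ten toy labels are made up for the example and are not data from the experiments.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: 60% neutral, and a classifier that always answers "neutral".
y_true = ["neutral"] * 6 + ["positive"] * 3 + ["negative"] * 1
y_pred = ["neutral"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, ["positive", "negative", "neutral"])
```

On this toy data the accuracy is 0.6 while the macro F1 is only 0.25, because the two minority classes each contribute an F1 of zero.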
For evaluation of the regression model, we report mean squared error and R², as these are both standard and also reported by the state-of-the-art papers for the FiQA dataset.
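For reference, both regression metrics can be computed as follows. This is a minimal sketch using the standard definitions; the function names and the three toy scores are assumptions for illustration, not values from the FiQA experiments.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy sentiment scores in [-1, 1].
y_true = [-0.5, 0.0, 0.5]
baseline = r_squared(y_true, [0.0, 0.0, 0.0])  # always predicting the mean
```

A model that always predicts the mean target gets an R² of 0, so R² shows how much better than that trivial baseline a regressor is, which MSE alone does not reveal.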
4.5 Implementation Details
For our implementation of BERT, we use a dropout probability of p = 0.1, a warm-up proportion of 0.2, a maximum sequence length of 64 tokens, a learning rate of 2e-5 and a mini-batch size of 64. We train the model for 6 epochs, evaluate on the validation set and choose the best one. For discriminative fine-tuning we set the discrimination rate as 0.85. We start training with only the classification layer unfrozen; after each third of a training epoch we unfreeze the next layer. An Amazon p2.xlarge EC2 instance with one NVIDIA K80 GPU, 4 vCPUs and 64 GiB of host memory is used to train the models.
5 EXPERIMENTAL RESULTS (RQ1 & RQ2)
The results of FinBERT, the baseline methods and the state-of-the-art on the Financial PhraseBank dataset classification task can be seen in Table 2. We present the results on both the whole dataset and the subset with 100% annotator agreement.

Table 2: Experimental Results on the Financial PhraseBank dataset

                  All data                       Data with 100% agreement
Model             Loss   Accuracy   F1 Score     Loss   Accuracy   F1 Score
LSTM              0.81   0.71       0.64         0.57   0.81       0.74
LSTM with ELMo    0.72   0.75       0.70         0.50   0.84       0.77
ULMFit            0.41   0.83       0.79         0.20   0.93       0.91
LPS               -      0.71       0.71         -      0.79       0.80
HSC               -      0.71       0.76         -      0.83       0.86
FinSSLX           -      -          -            -      0.91       0.88
FinBERT           0.37   0.86       0.84         0.13   0.97       0.95

Bold face indicates best result in the corresponding metric. LPS [17], HSC [8] and FinSSLX [15] results are taken from their respective papers. For LPS and HSC, overall accuracy is not reported in the papers. We calculated them using recall scores reported for different classes. For the models implemented by us, we report 10-fold cross validation results.

For all of the measured metrics, FinBERT performs clearly the best among both the methods we implemented ourselves (LSTM and ULMFit) and the models reported by other papers (LPS [17], HSC [8], FinSSLX [14]).
The LSTM classifier with no language model information performs the worst. In terms of accuracy, it is close to LPS and HSC (even better than LPS for examples with full agreement); however, it produces a low F1-score. That is due to it performing much better in the neutral class.
The LSTM classifier with ELMo embeddings improves upon the LSTM with static embeddings in all of the measured metrics. It still suffers from a low average F1-score due to poor performance in less represented labels. But its performance is comparable with LPS and HSC, besting them in accuracy. So contextualized word embeddings produce performance close to machine learning based methods for a dataset of this size.
ULMFit significantly improves on all of the metrics, and it doesn't suffer from the model performing much better in some classes than in others. It also handily beats the machine learning based models LPS and HSC. This shows the effectiveness of language model pre-training. AWD-LSTM is a very large model and it would be expected to suffer from over-fitting with a dataset this small. But due to language model pre-training and effective training strategies, it is able to overcome the small data problem. ULMFit also outperforms FinSSLX, which has a text simplification step as well as pre-training of word embeddings on a large financial corpus with sentiment labels.
FinBERT outperforms ULMFit, and consequently all of the other methods, in all metrics.
In order to measure the performance of the models on different sizes of labeled training datasets, we ran the LSTM classifiers, ULMFit and FinBERT on 5 different configurations. The result can be seen in Figure 2, where the cross-entropy losses on the test set for each model are drawn. 100 training examples are too few for all of the models. However, once the training size becomes 250, ULMFit and FinBERT start to successfully differentiate between labels, with an accuracy as high as 80% for FinBERT. All of the methods consistently get better with more data, but ULMFit and FinBERT do better with 250 examples than the LSTM classifiers do with the whole dataset. This shows the effectiveness of language model pre-training.

Figure 2: Test loss for different training set sizes

The results for the FiQA sentiment dataset are presented in Table 3. Our model outperforms state-of-the-art models for both MSE and R². It should be noted that the test set these two papers [31] [24] use is the official FiQA Task 1 test set. Since we don't have access to that, we report the results on 10-fold cross validation. There is no indication in [15] that the train and test sets they publish come from different distributions, and our model can be interpreted to be at a disadvantage, since we need to set aside a subset of the training set as the test set, while the state-of-the-art papers can use the complete training set.
6 EXPERIMENTAL ANALYSIS
6.1 Effects of further pre-training (RQ3)
We first measure the effect of further pre-training on the performance of the classifier. We compare three models: 1) no further pre-training (denoted by Vanilla BERT), 2) further pre-training on the classification training set (denoted by FinBERT-task), 3) further pre-training on the domain corpus, TRC2-financial (denoted by FinBERT-domain). Models are evaluated with loss, accuracy and macro average F1 scores on the test dataset. The results can be seen in Table 4.

Table 3: Experimental Results on FiQA Sentiment Dataset

Model                      MSE    R²
Yang et al. (2018)         0.08   0.40
Piao and Breslin (2018)    0.09   0.41
FinBERT                    0.07   0.55

Bold face indicates best result in the corresponding metric. Yang et al. (2018) [31] and Piao and Breslin (2018) [24] report results on the official test set. Since we don't have access to that set, our MSE and R² are calculated with 10-fold cross validation.

Table 4: Performance with different pre-training strategies

Model             Loss   Accuracy   F1 Score
Vanilla BERT      0.38   0.85       0.84
FinBERT-task      0.39   0.86       0.85
FinBERT-domain    0.37   0.86       0.84

Bold face indicates best result in the corresponding metric. Results are reported on 10-fold cross validation.

The classifier that was further pre-trained on the financial domain corpus performs best among the three, though the difference is not very high. There might be four reasons behind this result: 1) the corpus might have a different distribution than the task set, 2) BERT classifiers might not improve significantly with further pre-training, 3) short sentence classification might not benefit significantly from further pre-training, 4) performance is already so good that there is not much room for improvement. We think that the last explanation is the likeliest, because for the subset of Financial PhraseBank on which all of the annotators agree, the accuracy of Vanilla BERT is already 0.96. The performance on the other agreement levels should be lower, as even humans can't agree fully on them. More experiments with another labeled financial dataset are necessary to conclude that the effect of further pre-training on a domain corpus is not significant.
6.2 Catastrophic forgetting (RQ4)
For measuring the performance of the techniques against catastrophic forgetting, we try four different settings: no adjustment (NA), only slanted triangular learning rate (STL), slanted triangular learning rate and gradual unfreezing (STL+GU), and the techniques in the previous one together with discriminative fine-tuning. We report the performance of these four settings with the loss on the test set and the trajectory of the validation loss over training epochs. The results can be seen in Table 5 and Figure 3.
Applying all three of the strategies produces the best performance in terms of test loss and accuracy. Gradual unfreezing and discriminative fine-tuning have the same reasoning behind them: higher level features should be fine-tuned more than the lower level ones, since the information learned from language modeling is mostly present in the lower levels.

Figure 3: Validation loss trajectories with different training strategies

Table 5: Performance with different fine-tuning strategies

Strategy     Loss   Accuracy   F1 Score
None         0.48   0.83       0.83
STL          0.40   0.81       0.82
STL+GU       0.40   0.86       0.86
STL+DFT      0.42   0.79       0.79
All three    0.37   0.86       0.84

Bold face indicates best result in the corresponding metric. Results are reported on 10-fold cross validation. STL: slanted triangular learning rates, GU: gradual unfreezing, DFT: discriminative fine-tuning.

We see from Table 5 that using only discriminative fine-tuning with slanted triangular learning rates performs worse than using the slanted triangular learning rates alone. This shows that gradual unfreezing is the most important technique for our case.
One way that catastrophic forgetting can show itself is a sudden increase in validation loss after several epochs. As the model is trained, it quickly starts to overfit when no measure is taken accordingly. As can be seen in Figure 3, that is the case when none of the aforementioned techniques are applied. The model achieves the best performance on the validation set after the first epoch and then starts to overfit. With all three techniques applied, the model is much more stable. The other combinations lie between these two cases.
6.3 Choosing the best layer for classification (RQ5)
BERT has 12 Transformer encoder layers. It is not necessarily a given that the last layer captures the most relevant information regarding the classification task during language model training. For this experiment, we investigate which layer out of the 12 Transformer encoder layers gives the best result for classification. We put the classification layer after the [CLS] tokens of the respective representations. We also try taking the average of all layers.
As shown in Table 6, the last layer contributes the most to the model performance in terms of all the metrics measured. This might be indicative of two factors: 1) when the higher layers are used, the model that is being trained is larger, hence possibly more powerful, 2) the lower layers capture deeper semantic information, hence they struggle to fine-tune that information for classification.

Table 6: Performance on different encoder layers used for classification

Layer for classification   Loss   Accuracy   F1 Score
Layer-1                    0.65   0.76       0.77
Layer-2                    0.54   0.78       0.78
Layer-3                    0.52   0.76       0.77
Layer-4                    0.48   0.80       0.77
Layer-5                    0.52   0.80       0.80
Layer-6                    0.45   0.82       0.82
Layer-7                    0.43   0.82       0.83
Layer-8                    0.44   0.83       0.81
Layer-9                    0.41   0.84       0.82
Layer-10                   0.42   0.83       0.82
Layer-11                   0.38   0.84       0.83
Layer-12                   0.37   0.86       0.84
All layers - mean          0.41   0.84       0.84

6.4 Training only a subset of the layers (RQ6)
BERT is a very large model. Even on small datasets, fine-tuning the whole model requires significant time and computing power. Therefore, if a slightly lower performance can be achieved by fine-tuning only a subset of all parameters, it might be preferable in some contexts. Especially if the training set is very large, this change might make BERT more convenient to use. Here we experiment with fine-tuning only the last k encoder layers.
The results are presented in Table 7. Fine-tuning only the classification layer does not achieve performance close to fine-tuning other layers. However, fine-tuning only the last layer handily outperforms the state-of-the-art machine learning methods like HSC. After Layer-9, the performance becomes virtually the same, only to be outperformed by fine-tuning the whole model. This result shows that in order to utilize BERT, an expensive training of the whole model is not mandatory. A fair trade-off can be made for much less training time with a small decrease in model performance.

Table 7: Performance on starting training from different layers

First layer unfrozen    Loss   Accuracy   Training time
Embeddings layer        0.37   0.86       332s
Layer-1                 0.39   0.83       302s
Layer-2                 0.39   0.83       291s
Layer-3                 0.38   0.83       272s
Layer-4                 0.38   0.82       250s
Layer-5                 0.40   0.83       240s
Layer-6                 0.40   0.81       220s
Layer-7                 0.39   0.82       205s
Layer-8                 0.39   0.84       188s
Layer-9                 0.39   0.84       172s
Layer-10                0.41   0.84       158s
Layer-11                0.45   0.82       144s
Layer-12                0.47   0.81       133s
Classification layer    1.04   0.52       119s

6.5 Where does the model fail?
With 97% accuracy on the subset of Financial PhraseBank with 100% annotator agreement, we think it might be an interesting exercise to examine cases where the model failed to predict the true label. Therefore, in this section we will present several examples where the model makes the wrong prediction. Also, in Malo et al. (2014) [17], it is indicated that most of the inter-annotator disagreements are between positive and neutral labels (agreement for separating positive-negative, negative-neutral and positive-neutral are 98.7%, 94.2% and 75.2% respectively). The authors attribute that to the difficulty of distinguishing "commonly used company glitter and actual positive statements". We will present the confusion matrix in order to observe whether this is the case for FinBERT as well.
Example 1: Pre-tax loss totaled euro 0.3 million, compared to a loss of euro 2.2 million in the first quarter of 2005.
True value: Positive   Predicted: Negative
Example 2: This implementation is very important to the operator, since it is about to launch its Fixed to Mobile convergence service in Brazil
True value: Neutral   Predicted: Positive
Example 3: The situation of coated magazine printing paper will continue to be weak.
True value: Negative   Predicted: Neutral
The first example is actually the most common type of failure. The model fails to do the math on which figure is higher, and in the absence of words indicative of direction like "increased", might make the prediction of neutral. However, there are many similar cases where it does make the true prediction too. Examples 2 and 3 are different versions of the same type of failure. The model fails to distinguish a neutral statement about a given situation from a statement that indicates polarity about the company. In the third example, information about the company's business would probably help.
The confusion matrix is presented in Figure 4.
73% of the failures happen between the labels positive and neutral, while the same number is 5% for negative and positive. That is consistent with both the inter-annotator agreement numbers and common sense. It is easier to differentiate between positive and negative. But it might be more challenging to decide whether a statement indicates a positive outlook or merely an objective observation.

Figure 4: Confusion matrix

7 CONCLUSION AND FUTURE WORK
In this paper, we implemented BERT for the financial domain by further pre-training it on a financial corpus and fine-tuning it for sentiment analysis (FinBERT). To the best of our knowledge, this work is the first application of BERT to finance and one of the few that experimented with further pre-training on a domain-specific corpus. On both of the datasets we used, we achieved state-of-the-art results by a significant margin. For the classification task, we increased the state-of-the-art by 15% in accuracy.
In addition to BERT, we also implemented other pre-trained language models like ELMo and ULMFit for comparison purposes. ULMFit, further pre-trained on a financial corpus, beat the previous state-of-the-art for the classification task, only to a smaller degree than BERT.
Th
e
s
e
r
e
su
l
t
s sh
o
w
t
h
e
effe
c
t
i
v
e
n
e
ss
o
f pr
e
-
t
r
a
i
n
ed
l
a
n
-
g
ua
g
e
mod
e
l
s fo
r
a do
w
n
-
s
t
r
e
am
t
as
k
such as s
e
n
t
i
m
e
n
t
ana
ly
s
i
s
e
sp
e
cia
lly
w
ith a sma
ll
l
a
be
l
e
d datas
e
t
.
Th
e
comp
l
e
t
e
datas
e
t in
-
c
l
u
ded
m
o
r
e
t
h
a
n
3000
e
x
a
mp
l
e
s
,
b
u
t
F
i
n
BER
T w
a
s
ab
l
e
to
surp
a
ss
th
e
p
r
e
v
ious stat
e
-
of
-
th
e
a
r
t
e
v
e
n
w
ith a t
r
ainin
g
s
e
t as sma
ll
as
500
e
x
a
mp
l
e
s
.
T
h
i
s
i
s
a
n
i
mp
o
r
ta
n
t
r
e
su
l
t,
s
i
nc
e
dee
p
l
ea
rn
i
ng
te
ch
-
n
i
q
u
e
s f
o
r NLP h
a
v
e
bee
n
t
r
ad
i
t
i
o
n
a
lly
l
abe
l
ed
a
s
too
"
data
-
hungr
y
"
,
wh
i
ch
i
s
a
pp
a
r
e
n
t
ly
n
o
l
o
ng
e
r
t
h
e
c
a
s
e.
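A training-set-size comparison of this kind can be set up by drawing fixed-size random subsets of the labeled data and training on each. The following is a minimal sketch under stated assumptions: the helper name `subsample`, the seed, the subset sizes, and the placeholder dataset are all hypothetical and are not taken from our experiments.

```python
import random


def subsample(examples, size, seed=0):
    """Return `size` examples drawn without replacement, reproducibly."""
    if size > len(examples):
        raise ValueError("requested more examples than are available")
    rng = random.Random(seed)  # local generator, so global state is untouched
    return rng.sample(examples, size)


# Hypothetical stand-in for a labeled dataset of (sentence, label) pairs.
full_train = [(f"sentence {i}", i % 3) for i in range(3000)]

# Illustrative subset sizes; each subset would be used to fine-tune a model.
for size in (250, 500, 1000, 2000):
    subset = subsample(full_train, size)
    print(size, len(subset))
```

Fixing the seed keeps the subsets reproducible, so differences between runs can be attributed to the training-set size rather than to which examples happened to be drawn.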
We conducted extensive experiments with BERT, investigating the effects of further pre-training and several training strategies. We could not conclude that further pre-training on a domain-specific corpus was significantly better than not doing so for our case. Our theory is that BERT already performs well enough on our dataset that there is not much room for the improvement that further pre-training can provide. We also found that learning rate regimes that fine-tune the higher layers more aggressively than the lower ones perform better and are more effective in preventing catastrophic forgetting. Another conclusion from our experiments was that comparable performance can be achieved with much less training time by fine-tuning only the last 2 layers of BERT.

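The two training strategies mentioned above, layer-dependent learning rates and fine-tuning only the top layers, can be sketched as follows. This is an illustrative sketch, not our training code: the function names, the base learning rate, and the decay factor are assumptions chosen for the example, and 12 is the number of encoder layers in BERT-base.

```python
def layerwise_learning_rates(base_lr, num_layers, decay):
    """Return one learning rate per encoder layer, lowest layer first.

    The top layer gets `base_lr`; each layer below gets the rate of the
    layer above it divided by `decay`, so decay > 1 moves lower layers less.
    """
    if decay <= 0:
        raise ValueError("decay must be positive")
    return [base_lr / decay ** (num_layers - 1 - i) for i in range(num_layers)]


def trainable_layers(num_layers, last_k):
    """Return a per-layer flag that is True only for the top `last_k` layers."""
    return [i >= num_layers - last_k for i in range(num_layers)]


# Hypothetical settings: the base rate and decay factor are illustrative.
rates = layerwise_learning_rates(base_lr=2e-5, num_layers=12, decay=1.2)
flags = trainable_layers(num_layers=12, last_k=2)
for layer, (lr, train) in enumerate(zip(rates, flags)):
    print(f"layer {layer:2d}  lr={lr:.2e}  trainable={train}")
```

In a deep learning framework, the per-layer rates would typically be passed to the optimizer as separate parameter groups, and the layers flagged as not trainable would have their gradients disabled.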
Financial sentiment analysis is not a goal on its own; it is only as useful as the financial decisions it can support. One way our work might be extended is to use FinBERT directly with stock market return data (in terms of both directionality and volatility) on financial news. FinBERT is good enough for extracting explicit sentiments, but modeling implicit information that is not necessarily apparent even to those who are writing the text should be a challenging task. Another possible extension is using FinBERT for other natural language processing tasks, such as named entity recognition or question answering, in the financial domain.

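To make the first extension concrete, the sketch below shows one way directionality and volatility could be derived from a price series and compared with sentiment scores. It is a hedged illustration of a possible setup, not an experiment we ran: the function names, the `hit_rate` agreement measure, and all numbers are hypothetical.

```python
import math
import statistics


def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def directionality(returns):
    """Map each return to +1 (up), -1 (down) or 0 (unchanged)."""
    return [1 if r > 0 else -1 if r < 0 else 0 for r in returns]


def volatility(returns):
    """Sample standard deviation of the returns (needs at least two)."""
    return statistics.stdev(returns)


def hit_rate(sentiment_scores, directions):
    """Fraction of periods where the sentiment sign matches the direction.

    Periods with a zero score or an unchanged price are skipped.
    """
    pairs = [(s, d) for s, d in zip(sentiment_scores, directions)
             if s != 0 and d != 0]
    if not pairs:
        return float("nan")
    return sum((s > 0) == (d > 0) for s, d in pairs) / len(pairs)


# Hypothetical closing prices and per-period sentiment scores in [-1, 1].
prices = [100.0, 101.5, 100.8, 102.3, 102.3, 101.0]
sentiment = [0.6, -0.3, -0.1, 0.4, -0.5]

returns = log_returns(prices)
print("volatility:", volatility(returns))
print("hit rate:", hit_rate(sentiment, directionality(returns)))
```

A real study would also need to align news timestamps with trading periods and control for market-wide movements, which this sketch leaves out.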

8 ACKNOWLEDGEMENTS

I would like to show my gratitude to Pengjie Ren and Zulkuf Genc for their excellent supervision. They provided me with both independence in setting my own course for the research and valuable suggestions when I needed them. I would also like to thank the Naspers AI team for entrusting me with this project and always encouraging me to share my work. I am grateful to NIST for sharing the Reuters TRC-2 corpus with me and to Malo et al. for making the excellent Financial PhraseBank publicly available.


REFERENCES

[1] Basant Agarwal and Namita Mittal. 2016. Machine Learning Approach for Sentiment Analysis. Springer International Publishing, Cham, 21-45. https://doi.org/10.1007/978-3-319-25343-5_3
[2] Oscar Araque, Ignacio Corcuera-Platas, J. Fernando Sánchez-Rada, and Carlos A. Iglesias. 2017. Enhancing deep learning sentiment analysis with ensemble techniques in social applications. Expert Systems with Applications 77 (jul 2017), 236-246. https://doi.org/10.1016/j.eswa.2017.02.002
[3] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. (2018). https://doi.org/arXiv:1811.03600v2 arXiv:1810.04805
[4] Li Guo, Feng Shi, and Jun Tu. 2016. Textual analysis and machine leaning: Crack unstructured data in finance and accounting. The Journal of Finance and Data Science 2, 3 (sep 2016), 153-170. https://doi.org/10.1016/J.JFDS.2017.02.001
[5] Jeremy Howard and Sebastian Ruder. 2018. Universal Language Model Fine-tuning for Text Classification. (jan 2018). arXiv:1801.06146 http://arxiv.org/abs/1801.06146
[6] Neel Kant, Raul Puri, Nikolai Yakovenko, and Bryan Catanzaro. 2018. Practical Text Classification With Large Pre-Trained Language Models. (2018). arXiv:1812.01207 http://arxiv.org/abs/1812.01207
[7] Mathias Kraus and Stefan Feuerriegel. 2017. Decision support from financial disclosures with deep neural networks and transfer learning. Decision Support Systems 104 (2017), 38-48. https://doi.org/10.1016/j.dss.2017.10.001 arXiv:1710.03954
[8] Srikumar Krishnamoorthy. 2018. Sentiment analysis of financial news articles using performance indicators. Knowledge and Information Systems 56, 2 (aug 2018), 373-394. https://doi.org/10.1007/s10115-017-1134-1
[9] Xiaodong Li, Haoran Xie, Li Chen, Jianping Wang, and Xiaotie Deng. 2014. News impact on stock price return via sentiment analysis. Knowledge-Based Systems 69 (oct 2014), 14-23. https://doi.org/10.1016/j.knosys.2014.04.022
[10] Bing Liu. 2012. Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies 5, 1 (may 2012), 1-167. https://doi.org/10.2200/s00416ed1v01y201204hlt016
[11] Tim Loughran and Bill Mcdonald. 2011. When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks. Journal of Finance 66, 1 (feb 2011), 35-65. https://doi.org/10.1111/j.1540-6261.2010.01625.x
[12] Tim Loughran and Bill Mcdonald. 2016. Textual Analysis in Accounting and Finance: A Survey. Journal of Accounting Research 54, 4 (2016), 1187-1230. https://doi.org/10.1111/1475-679X.12123
[13] Bernhard Lutz, Nicolas Pröllochs, and Dirk Neumann. 2018. Sentence-Level Sentiment Analysis of Financial News Using Distributed Text Representations and Multi-Instance Learning. Technical Report. arXiv:1901.00400 http://arxiv.org/abs/1901.00400
[14] Macedo Maia, André Freitas, and Siegfried Handschuh. 2018. FinSSLx: A Sentiment Analysis Model for the Financial Domain Using Text Simplification. In 2018 IEEE 12th International Conference on Semantic Computing (ICSC). IEEE, 318-319. https://doi.org/10.1109/ICSC.2018.00065
[15] Macedo Maia, Siegfried Handschuh, André Freitas, Brian Davis, Ross Mcdermott, Manel Zarrouk, Alexandra Balahur, and Ross Mc-Dermott. 2018. Companion of the The Web Conference 2018 on The Web Conference 2018, WWW 2018, Lyon, France, April 23-27, 2018. ACM. https://doi.org/10.1145/3184558
[16] Burton G Malkiel. 2003. The Efficient Market Hypothesis and Its Critics. Journal of Economic Perspectives 17, 1 (feb 2003), 59-82. https://doi.org/10.1257/089533003321164958
[17] Pekka Malo, Ankur Sinha, Pekka Korhonen, Jyrki Wallenius, and Pyry Takala. 2014. Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology 65, 4 (2014), 782-796. https://doi.org/10.1002/asi.23062 arXiv:1307.5336v2
[18] G. Marcus. 2018. Deep Learning: A Critical Appraisal. arXiv e-prints (Jan. 2018). arXiv:cs.AI/1801.00631
[19] Justin Martineau and Tim Finin. 2009. Delta TFIDF: An Improved Feature Space for Sentiment Analysis. In ICWSM, Eytan Adar, Matthew Hurst, Tim Finin, Natalie S. Glance, Nicolas Nicolov, and Belle L. Tseng (Eds.). The AAAI Press. http://dblp.uni-trier.de/db/conf/icwsm/icwsm2009.html#MartineauF09
[20] Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. 2017. Learned in Translation: Contextualized Word Vectors. Nips (2017), 1-12. arXiv:1708.00107 http://arxiv.org/abs/1708.00107
[21] Stephen Merity, Nitish Shirish Keskar, and Richard Socher. 2017. Regularizing and Optimizing LSTM Language Models. CoRR abs/1708.02182 (2017). arXiv:1708.02182 http://arxiv.org/abs/1708.02182
[22] Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, 1532-1543. https://doi.org/10.3115/v1/D14-1162
[23] Matthew E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. (2018). https://doi.org/10.18653/v1/N18-1202 arXiv:1802.05365
[24] Guangyuan Piao and John G Breslin. 2018. Financial Aspect and Sentiment Predictions with Deep Neural Networks. 1973-1977. https://doi.org/10.1145/3184558.3191829
[25] Aliaksei Severyn and Alessandro Moschitti. 2015. Twitter Sentiment Analysis with Deep Convolutional Neural Networks. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR ’15. ACM Press. https://doi.org/10.1145/2766462.2767830
[26] Sahar Sohangir, Dingding Wang, Anna Pomeranets, and Taghi M Khoshgoftaar. 2018. Big Data: Deep Learning for financial sentiment analysis. Journal of Big Data 5, 1 (2018). https://doi.org/10.1186/s40537-017-0111-6
[27] Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. 2019. How to Fine-Tune BERT for Text Classification? (2019). arXiv:1905.05583 https://arxiv.org/pdf/1905.05583v1.pdf http://arxiv.org/abs/1905.05583
[28] Abinash Tripathy, Ankit Agrawal, and Santanu Kumar Rath. 2016. Classification of sentiment reviews using n-gram machine learning approach. Expert Systems with Applications 57 (sep 2016), 117-126. https://doi.org/10.1016/j.eswa.2016.03.028
[29] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. Nips (2017). arXiv:1706.03762 http://arxiv.org/abs/1706.03762
[30] Casey Whitelaw, Navendu Garg, and Shlomo Argamon. 2005. Using appraisal groups for sentiment analysis. In Proceedings of the 14th ACM international conference on Information and knowledge management - CIKM ’05. ACM Press. https://doi.org/10.1145/1099554.1099714
[31] Steve Yang, Jason Rosenfeld, and Jacques Makutonin. 2018. Financial Aspect-Based Sentiment Analysis using Deep Representations. (2018). arXiv:1808.07931 https://arxiv.org/pdf/1808.07931v1.pdf http://arxiv.org/abs/1808.07931
[32] Lei Zhang, Shuai Wang, and Bing Liu. 2018. Deep learning for sentiment analysis: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8, 4 (mar 2018), e1253. https://doi.org/10.1002/widm.1253
[33] Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books. (jun 2015). arXiv:1506.06724 http://arxiv.org/abs/1506.06724